The Celerity High-level API: C++20 for Accelerator Clusters
Abstract
Providing convenient APIs and notations for data parallelism which remain accessible to programmers while still providing good performance has been a long-term goal of researchers as well as language and library designers. C++20 introduces ranges and views, as well as the composition of operations on them using a concise syntax, but the efficient implementation of these features is restricted to CPUs. We present the Celerity High-level API, which makes similarly concise mechanisms applicable to GPUs and accelerators, and even to distributed memory clusters of GPUs. Crucially, we achieve this very high level of abstraction without a significant negative performance impact compared to a lower-level implementation, and without introducing any non-standard toolchain components or compilers, by implementing a C++ library infrastructure on top of the Celerity system. This is made possible by two central API design strategies, which form the core of our contribution. Firstly, gathering as much information as possible at compile-time and using metaprogramming techniques to automatically fuse several distinctly formulated processing steps into a single accelerator kernel invocation. And secondly, leveraging C++20 “Concepts” in order to avoid type erasure, allowing for highly efficient code generation. We have evaluated our approach quantitatively in comparison to more manual implementations of several benchmarks, demonstrating its low overhead. Additionally, we investigated the individual performance impact of specific optimizations and design choices, illustrating the advantages afforded by a Concepts-based approach.
Similar Resources
Performance-Portable High-Level Accelerator Programming
The OpenMP API provides a portable model for efficient, high level thread-parallel programming across platforms, vendors, operating systems. We are developing a model with the same advantages to address compute accelerators. In this talk, we explore today’s accelerator landscape, along with the perils of current programming methods. We demonstrate why OpenCL, while impressive and important, doe...
LegUp: High-Level Synthesis for FPGA-Based Processor/Accelerator Systems
It is generally accepted that a custom hardware implementation of a set of computations will provide superior speed and energy-efficiency relative to a software implementation. However, the cost and difficulty of hardware design is often prohibitive, and consequently, a software approach is used for most applications. In this paper, we introduce a new high-level synthesis tool called LegUp that...
OSCAR API v2.1: Extensions for an Advanced Accelerator Control Scheme to a Low-Power Multicore API
The number of cores in smartphones and tablet-PCs are rapidly increasing along with their required high computational power. However, almost all applications on those devices have not used multiple cores for their high speed and low power execution since the application development environments, which allow the application developers easy and prompt development of parallelized application, are ...
Reusable software components for accelerator-based clusters
The emerging accelerator-based heterogeneous clusters, comprising specialized processors such as the IBM Cell and GPUs, have exhibited excellent price to performance ratio as well as high energy-efficiency. However, developing and maintaining software for such systems is fraught with challenges, especially for modern high-performance computing (HPC) applications that can benefit the most from l...
The Raincore API for Clusters of Networking Elements
The rapid growth of the Internet over the past several years has focused on speed of deployment while, in many cases, ignoring the requirements of high-performance end-to-end transactions. Given that the Internet is primarily an infrastructure to flow information from where it is stored to where it is requested, we can view the communication path between end points as a chain and each networki...
Journal
Journal title: International Journal of Parallel Programming
Year: 2022
ISSN: 0885-7458, 1573-7640
DOI: https://doi.org/10.1007/s10766-022-00731-8